Add a memory bound FileStatisticsCache for the Listing Table #20047
mkleen wants to merge 49 commits into apache:main from file-stats-cache
Conversation
Force-pushed a66420a → 3b33739
Force-pushed 3b33739 → 8e5560b
Force-pushed e273afc → b297378
Force-pushed 59c6bce → 4542db8
@kosiew Thank you for the feedback!

@kosiew Anything else needed to get this merged? Another approval maybe?
Force-pushed 205f96c → 92899a7
```rust
impl<T: DFHeapSize> DFHeapSize for Arc<T> {
    fn heap_size(&self) -> usize {
        // Arc stores weak and strong counts on the heap alongside an instance of T
        2 * size_of::<usize>() + size_of::<T>() + self.as_ref().heap_size()
    }
}
```
This won't be accurate:

```rust
let a1 = Arc::new(vec![1, 2, 3]);
let a2 = a1.clone();
let a3 = a1.clone();
let a4 = a3.clone();
// this should be true because all `a`s point to the same object in memory
// but the current implementation does not detect this and counts them separately
assert_eq!(a4.heap_size(), a1.heap_size() + a2.heap_size() + a3.heap_size() + a4.heap_size());
```

The only solution I can imagine is for the caller to keep track of the pointer addresses which have already been "sized" and to ignore any `Arc` which points to an address that has been "sized" earlier.
Good catch! I took this implementation from https://github.com/apache/arrow-rs/blob/main/parquet/src/file/metadata/memory.rs#L97-L102. I would suggest a follow-up here as well. We are planning to restructure the whole heap size estimation anyway.
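The pointer-tracking idea could look roughly like this (a minimal, self-contained sketch; `DedupHeapSize` and its `seen` context parameter are hypothetical illustrations, not the PR's actual API):

```rust
use std::collections::HashSet;
use std::mem::size_of;
use std::sync::Arc;

// Hypothetical variant of the heap-size trait that threads a set of
// already-counted allocation addresses through the traversal.
trait DedupHeapSize {
    fn heap_size_dedup(&self, seen: &mut HashSet<usize>) -> usize;
}

impl DedupHeapSize for Vec<i32> {
    fn heap_size_dedup(&self, _seen: &mut HashSet<usize>) -> usize {
        self.capacity() * size_of::<i32>()
    }
}

impl<T: DedupHeapSize> DedupHeapSize for Arc<T> {
    fn heap_size_dedup(&self, seen: &mut HashSet<usize>) -> usize {
        // Count the shared allocation only the first time its address is seen.
        let addr = Arc::as_ptr(self) as usize;
        if seen.insert(addr) {
            2 * size_of::<usize>() + size_of::<T>() + self.as_ref().heap_size_dedup(seen)
        } else {
            0
        }
    }
}

fn main() {
    let a1 = Arc::new(vec![1, 2, 3]);
    let a2 = a1.clone();
    let mut seen = HashSet::new();
    let total = a1.heap_size_dedup(&mut seen) + a2.heap_size_dedup(&mut seen);
    // The clone contributes nothing: its allocation was already counted.
    assert_eq!(total, a1.heap_size_dedup(&mut HashSet::new()));
}
```

The trade-off is that every `heap_size` call site has to thread the `seen` set through, which is why deferring this to the planned heap-size restructuring seems reasonable.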
@martin-g Thanks for this great review! I am on it.
Force-pushed 92899a7 → 2e3aff9
Force-pushed 365cb2f → 9b48bd8
@adriangb Could you please do another benchmark run? I see no regressions anymore when running it.

run benchmark clickbench_partitioned

@Dandandan Thank you, I broke the linter. It's fixed now. Does this affect the benchmark run?
🤖 Benchmark running (GKE): comparing file-stats-cache (3b64005) to 244f891 (merge-base) using clickbench_partitioned

🤖 Benchmark completed (GKE): results for clickbench_partitioned, base (merge-base) vs. branch
This looks much better now. There are a few minor regressions, which could be just noise. Could we do a second benchmark run?

@kosiew I added all the changes you requested. Could you do another review please?
run benchmark clickbench_partitioned

🤖 Benchmark running (GKE): comparing file-stats-cache (3b64005) to 244f891 (merge-base) using clickbench_partitioned

@mkleen you should be able to trigger benchmark runs now fwiw

🤖 Benchmark completed (GKE): results for clickbench_partitioned, base (merge-base) vs. branch
Ok, the benchmarks are looking good again. At the moment the default cache size is 20 MiB; with 10 MiB the stats for clickbench_partitioned did not fit yet. I wonder what a reasonable default value would be?
kosiew left a comment
Thanks for the updates here, this is shaping up nicely.
I still have a couple of concerns.
Some previously noted issues appear to be intentionally deferred, like the Arc heap size double-counting and the empty string parsing case. Could you clarify whether those are planned for a follow-up PR?
I also ran into two additional issues while reviewing the final diff, including what looks like a functional regression in the reader path cache wiring. Because of that, I am not yet confident that the branch is free of new problems.
Would love your thoughts on the points below.
```diff
-let table = ListingTable::try_new(config)?.with_definition(sql_definition);
+let table = ListingTable::try_new(config)?
+    .with_definition(sql_definition)
+    .with_cache(self.runtime_env().cache_manager.get_file_statistic_cache());
```
I noticed that only the registered-listing-table path was updated to call .with_cache(...). Was it intentional that the generic reader path in _read_type() still uses ListingTable::try_new(config)? without attaching the runtime cache?
With ListingTable now defaulting collected_statistics to None, could this mean that SessionContext::read_parquet, read_csv, read_json, read_avro, and similar methods no longer benefit from the file statistics cache?
If so, would this be considered a regression in the default-cache rollout? How do you see this affecting users who rely on those reader paths?
No, this was not intentional. This needs to be fixed. Thanks a lot!
```rust
Some("50M".to_owned()),
Some("1M".to_owned()),
None,
Some("1M".to_owned()),
```
I see that DEFAULT_FILE_STATISTICS_MEMORY_LIMIT was increased to 20 MiB, but RuntimeEnvBuilder::entries() still hard-codes Some("1M".to_owned()) for datafusion.runtime.file_statistics_cache_limit.
Would it make sense to update entries() to match the new default?
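For illustration, size strings like "1M" or "50M" might be interpreted along these lines (a hedged sketch; `parse_memory_limit` is a hypothetical helper, not DataFusion's actual parser, with the empty-string edge case mentioned earlier handled explicitly):

```rust
// Hypothetical parser for human-readable size strings such as "1M" or "50M".
// Sketches how a setting like file_statistics_cache_limit might be parsed,
// including rejecting the empty string.
fn parse_memory_limit(s: &str) -> Result<usize, String> {
    let s = s.trim();
    if s.is_empty() {
        return Err("memory limit must not be empty".to_string());
    }
    // Split off a trailing unit character, if any (ASCII input assumed).
    let (digits, multiplier) = match s.split_at(s.len() - 1) {
        (d, "K") | (d, "k") => (d, 1usize << 10),
        (d, "M") | (d, "m") => (d, 1usize << 20),
        (d, "G") | (d, "g") => (d, 1usize << 30),
        _ => (s, 1),
    };
    digits
        .parse::<usize>()
        .map(|n| n * multiplier)
        .map_err(|e| format!("invalid memory limit '{s}': {e}"))
}

fn main() {
    assert_eq!(parse_memory_limit("1M"), Ok(1 << 20));
    assert_eq!(parse_memory_limit("50M"), Ok(50 << 20));
    assert_eq!(parse_memory_limit("512"), Ok(512));
    assert!(parse_memory_limit("").is_err());
}
```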
Which issue does this PR close?
This change introduces a default `FileStatisticsCache` implementation for the Listing Table with a size limit, implementing the following steps following #19052 (comment):

- Add heap size estimation for file statistics and the relevant data types used in caching (this is temporary until "Add a crate for HeapSize trait" arrow-rs#9138 is resolved)
- Redesign `DefaultFileStatisticsCache` to use a `LruQueue` to make it memory-bound, following "Adds memory-bound DefaultListFilesCache" #18855
- Introduce a size limit and use it together with the heap size to limit the memory usage of the cache
- Move `FileStatisticsCache` creation into `CacheManager`, making it session-scoped and shared across statements and tables
- Disable caching in some of the SQL logic tests where the change altered the output result, because the cache is now session-scoped and not query-scoped anymore

Closes "Add a default `FileStatisticsCache` implementation for the `ListingTable`" #19217
Closes "Add limit to `DefaultFileStatisticsCache`" #19052

Rationale for this change
See above.
What changes are included in this PR?
See above.
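The memory-bound `LruQueue` redesign described in the steps above could be sketched roughly like this (a deliberately simplified, hypothetical structure with a toy `HeapSize` impl; the actual DataFusion implementation differs):

```rust
use std::collections::VecDeque;

// Hypothetical heap-size trait mirroring the PR's estimation step.
trait HeapSize {
    fn heap_size(&self) -> usize;
}

impl HeapSize for String {
    fn heap_size(&self) -> usize {
        self.capacity()
    }
}

// Minimal memory-bound LRU queue: inserting past the budget evicts the
// least-recently-used entries until the memory limit is respected again.
struct LruQueue<V: HeapSize> {
    entries: VecDeque<(String, V)>, // front = least recently used
    memory_used: usize,
    memory_limit: usize,
}

impl<V: HeapSize> LruQueue<V> {
    fn new(memory_limit: usize) -> Self {
        Self { entries: VecDeque::new(), memory_used: 0, memory_limit }
    }

    fn put(&mut self, key: String, value: V) {
        self.memory_used += key.heap_size() + value.heap_size();
        self.entries.push_back((key, value));
        // Evict from the LRU end until we are back under the limit.
        while self.memory_used > self.memory_limit {
            match self.entries.pop_front() {
                Some((k, v)) => self.memory_used -= k.heap_size() + v.heap_size(),
                None => break,
            }
        }
    }

    fn get(&mut self, key: &str) -> Option<&V> {
        let pos = self.entries.iter().position(|(k, _)| k == key)?;
        // Move the hit to the back so it becomes the most recently used.
        let entry = self.entries.remove(pos)?;
        self.entries.push_back(entry);
        self.entries.back().map(|(_, v)| v)
    }
}

fn main() {
    // Each entry here costs 13 bytes (key capacity 6 + value capacity 7),
    // so a 20-byte budget holds only one entry at a time.
    let mut cache = LruQueue::new(20);
    cache.put("file-a".to_string(), "stats-a".to_string());
    cache.put("file-b".to_string(), "stats-b".to_string()); // evicts "file-a"
    assert!(cache.get("file-a").is_none());
    assert!(cache.get("file-b").is_some());
}
```

The session-scoped sharing step would then amount to constructing one such cache in the `CacheManager` and handing out a shared handle to each table, rather than building a fresh cache per query.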
Are these changes tested?
Yes.
Are there any user-facing changes?
A new runtime setting: `datafusion.runtime.file_statistics.cache_limit`